Method and device for video scene composition from varied data



The present invention relates to a method of composing an MPEG-4 video 
scene content at least from a first set of input video objects coded according to the MPEG-4 
standard, said method comprising a first decoding step for generating a first set of decoded 
MPEG-4 video objects from said first set of input video objects, and a rendering step for 
generating composed frames of said video scene from at least said first set of decoded 
MPEG-4 video objects. 

This invention may be used, for example, in the field of digital television 
broadcasting and implemented in a set top box as an Electronic Program Guide (EPG). 



The part of the MPEG-4 standard relating to system aspects, referred to as
ISO/IEC 14496-1, provides functionality for multimedia data manipulation. It is dedicated to
the composition of scenes containing different natural or synthetic objects, such as two- or
three-dimensional images, video clips, audio tracks, texts or graphics. This standard allows
the creation of scene content usable with multiple applications, allows flexibility in object
combination, and offers means for user interaction in scenes containing multiple objects. This
standard may be used in a communication system comprising a server and a client terminal
connected via a communication link. In such applications, MPEG-4 data exchanged between
the server and the client terminal are streamed on said communication link and used at the
client terminal to create multimedia applications.

The international patent application WO 00/01154 describes a terminal and
method of the above kind for composing and presenting MPEG-4 video programs. This
terminal comprises:

a terminal manager for managing the overall processing tasks, 

decoders for providing decoded objects, 

a composition engine for maintaining, updating, and assembling a scene graph of the 
decoded objects, and 

a presentation engine for providing a scene for presentation. 

It is an object of the invention to provide a cost-effective and optimized
method of video scene composition that allows the composition of an MPEG-4 video scene
simultaneously from video data coded according to the MPEG-4 video standard, referred to as
ISO/IEC 14496-2, and video data coded according to other video standards. The invention
takes the following aspects into consideration.

The composition method according to the prior art allows the composition of a
video scene from a set of decoded video objects coded according to the MPEG-4 standard.
To this end, a composition engine maintains and updates a scene graph of the current objects,
including their relative positions in a scene and their characteristics, and provides a
corresponding list of objects to be displayed to a presentation engine. In response, the
presentation engine retrieves the corresponding decoded object data stored in respective
composition buffers. The presentation engine renders the decoded objects for providing a
scene for presentation on a display.

With the widespread use of digital networks such as the Internet, most
multimedia applications resulting in a video scene composition collect video data from
different sources to enrich their content. In this context, if this prior art method is used for a
video scene composition, collected data not compliant with the MPEG-4 standard could not
be rendered, which would lead to a poor video scene content or produce an error in the
applications. Indeed, this prior art method is very restrictive since the video scene
composition can exclusively be performed from video objects coded according to the
MPEG-4 system standard, which excludes the use of other video data in the video scene
composition, such as MPEG-2 video data.

To circumvent the limitations of the prior art method, the method of video
scene composition according to the invention is characterized in that it comprises:

a) a second decoding step for generating a set of decoded video data from a second set of
input video data not MPEG-4 compliant,

b) a video object creation step for generating a second set of video objects, each created
video object being formed by the association of decoded video data extracted from said
set of decoded video data and a set of properties for defining characteristics of said
decoded video data in the video scene, said second set of video objects being rendered
jointly with said first set of decoded MPEG-4 video objects during said rendering step.

This allows a rendering of all the input video objects in the scene so as to
result in an MPEG-4 video scene. Indeed, it becomes possible to create and render an
enriched video scene from MPEG-4 video objects and video objects not compliant with the
MPEG-4 standard.

Since the association of properties with video objects not compliant with the MPEG-4
standard is cost-effective in terms of processing means, the invention can be used in
cost-effective products such as consumer products.

These and other aspects of the invention will be apparent from and elucidated 
with reference to the embodiments described hereinafter. 

The particular aspects of the invention will now be explained with reference to
the embodiments described hereinafter and considered in connection with the accompanying
drawings, in which identical parts or sub-steps are designated in the same manner:

Fig. 1 depicts the different functional blocks of the MPEG-4 video scene
composition according to the invention,

Fig. 2 depicts the hardware implementation of the MPEG-4 video scene
composition method according to the invention,

Fig. 3 depicts an embodiment of the invention.

The invention allows a video scene composition from input video streams
coded according to the MPEG-4 standard and input video streams coded according to
video standards other than the MPEG-4 standard. It is described for the case in which
said other video streams are coded according to the MPEG-2 video standard, but it
will be apparent to those skilled in the art that this invention may also be used with other
standards such as H.263, MPEG-1, or a proprietary company format.

Fig. 1 shows the different functional blocks of the video scene composition
according to the invention.

The method of scene composition according to the invention comprises the
following functional steps:

1. a first decoding step 101 for decoding an input video stream 102 containing input video
objects coded according to the MPEG-4 video standard. This decoding step 101 results
in decoded MPEG-4 video objects 103. If the input video stream 102 corresponds to a
demultiplexed video stream or comprises a plurality of elementary video streams, each
elementary video stream is decoded by a separate decoder during the decoding step 101;

2. a second decoding step 104 for decoding an input video stream 105 containing input
coded video data not coded according to the MPEG-4 video standard, but coded, for
example, according to the MPEG-2 video standard. This decoding step results in
decoded MPEG-2 video data 106. If the input video stream 105 corresponds to a
demultiplexed video stream or comprises a plurality of elementary video streams, each
elementary video stream is decoded by a separate decoder during the decoding step 104;

3. a video object creation step 107 for generating video objects 108 from said decoded
MPEG-2 video data 106. This step consists in associating with each decoded video data
106 a data structure holding a set of properties that define its characteristics in the final
video scene. Each data structure, linked to a given video data 106, comprises for example:

a) a field "depth" for defining the depth of said video data in the video scene (e.g.
foreground or background),

b) a field "transform" for defining a geometric transform of said video data (e.g. a
rotation characterized by an angle),

c) a field "transparency" for defining the transparency coefficient between said video
data and other video objects in the video scene.

In this way, the resulting video objects 108 are compatible with MPEG-4 video objects 103
in the sense that each video object 108 not only contains video frames but also refers to a set
of characteristics allowing its description in the video scene;

4. a rendering step 109 for assembling the video objects 103 and 108. To this end, the
video objects 103 and 108 are rendered by using their own object properties, or by
using object properties (filled during the video object creation step 107, for video
objects 103) contained in a BIFS stream 111 (Binary Format for Scene), said BIFS
stream 111 containing a scene graph description describing each object's properties in
the scene. The assembling order of video objects is determined by the depth of each
video object to be rendered: the video objects composing backgrounds are assembled
first, and the video objects composing foregrounds are assembled last. This
rendering results in the delivery of an MPEG-4 video scene 110.
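
The object creation step 107 and the depth-ordered rendering step 109 may be illustrated by the following minimal Python sketch. It is not part of the described method: the names SceneProperties, VideoObject, create_video_object and render are hypothetical, the convention that a larger depth value lies farther from the viewer is an assumption, transparency is assumed to range from 0.0 (opaque) to 1.0 (fully transparent), and only a single luminance plane is blended.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneProperties:
    depth: int           # assumption: larger value = farther from the viewer
    transform: float     # e.g. a rotation angle; not applied in this sketch
    transparency: float  # assumption: 0.0 = opaque, 1.0 = fully transparent


@dataclass
class VideoObject:
    properties: SceneProperties
    frame: List[List[int]]  # one luminance plane, as rows of pixel values


def create_video_object(decoded_frame, depth, transform=0.0, transparency=0.0):
    """Step 107: attach scene properties to decoded non-MPEG-4 video data."""
    return VideoObject(SceneProperties(depth, transform, transparency), decoded_frame)


def render(objects, width, height):
    """Step 109: assemble objects back to front into one composed frame."""
    composed = [[0] * width for _ in range(height)]
    # Backgrounds (largest depth) are assembled first, foregrounds last.
    for obj in sorted(objects, key=lambda o: o.properties.depth, reverse=True):
        alpha = 1.0 - obj.properties.transparency
        for y, row in enumerate(obj.frame[:height]):
            for x, pixel in enumerate(row[:width]):
                composed[y][x] = round(alpha * pixel + (1.0 - alpha) * composed[y][x])
    return composed


# Decoded MPEG-2 data becomes a background object; a half-transparent
# one-pixel object stands in for an MPEG-4 foreground object.
background = create_video_object([[100, 100], [100, 100]], depth=2)
preview = VideoObject(SceneProperties(depth=1, transform=0.0, transparency=0.5), [[200]])
scene = render([preview, background], width=2, height=2)
```

Because both kinds of object expose the same pair of frame data and properties, the renderer in this sketch does not need to know which decoder produced a given object.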

As an example, in an electronic program guide (EPG) allowing a viewer to
browse TV programs, this method may be used for composing a video scene from an
MPEG-2 video stream 105 and an MPEG-4 video stream 102, said MPEG-2 video stream
105 defining, after decoding 104, a full screen background MPEG-2 video, while said
MPEG-4 video stream defines, after decoding 101, a first object MPEG4_video_object1
corresponding to a video of reduced format (used as a TV preview, for example) and a
second object MPEG4_video_object2 corresponding to textual information (used as time and
channel indications).

The rendering of these three video elements is made possible by the
association of a set of properties Scene_video_object3 with the decoded MPEG-2 video in
order to define the characteristics of this MPEG-2 video in the video scene, this association
resulting in the video object MPEG2_video_object3. The two decoded MPEG-4 objects are
each associated, according to the MPEG-4 syntax relative to scene description, with a set of
properties (Scene_video_object1 and Scene_video_object2, respectively) in order to define
their characteristics in the video scene. These two sets Scene_video_object1 and
Scene_video_object2 may be filled by pre-set parameters or by parameters contained in the
BIFS stream 111. In the latter case, the composed scene may be updated in real time,
especially if the BIFS update mechanism, well known to those skilled in the art, is used, which
allows the characteristics of video objects in the scene to be changed.

In each video object structure, a structure Buffer_video is also defined for
accessing video data, i.e. video frames, by three pointers pointing to respective components
Y, U and V of each video data. For example, the component Y of the video object 1 is
accessed by pointer pt_video1_Y, while the components U and V are accessed by pointers
pt_video1_U and pt_video1_V, respectively.

The corresponding scene graph has the following structure:

Scene_graph {
    MPEG4_video_object1 {
        Scene_video_object1 {
            depth1
            transform1
            transparency1
        }
        Buffer_video1 {
            pt_video1_Y
            pt_video1_U
            pt_video1_V
        }
    }
    MPEG4_video_object2 {
        Scene_video_object2 {
            depth2
            transform2
            transparency2
        }
        Buffer_video2 {
            pt_video2_Y
            pt_video2_U
            pt_video2_V
        }
    }
    MPEG2_video_object3 {
        Scene_video_object3 {
            depth3
            transform3
            transparency3
        }
        Buffer_video3 {
            pt_video3_Y
            pt_video3_U
            pt_video3_V
        }
    }
}
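
One entry of such a scene graph may be sketched in Python as follows. This is an illustration under stated assumptions, not the described implementation: the class BufferVideo and the helper make_buffer are hypothetical names, the planar 4:2:0 layout (U and V planes each a quarter of the size of the Y plane) is an assumption not stated in the text, and memoryview slices play the role of the three pointers because they reference the frame memory without copying it.

```python
from dataclasses import dataclass


@dataclass
class BufferVideo:
    y: memoryview  # plays the role of pt_videoN_Y
    u: memoryview  # plays the role of pt_videoN_U
    v: memoryview  # plays the role of pt_videoN_V


def make_buffer(frame: bytearray, width: int, height: int) -> BufferVideo:
    """Expose the Y, U and V planes of one planar frame without copying.

    Assumption: 4:2:0 planar layout, Y plane first, then U, then V.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    view = memoryview(frame)
    return BufferVideo(
        y=view[:y_size],
        u=view[y_size:y_size + c_size],
        v=view[y_size + c_size:y_size + 2 * c_size],
    )


# A 4x4 frame standing in for decoded MPEG-2 data (16 + 4 + 4 bytes).
frame3 = bytearray(4 * 4 * 3 // 2)

scene_graph = {
    "MPEG2_video_object3": {
        "Scene_video_object3": {"depth": 2, "transform": 0.0, "transparency": 0.0},
        "Buffer_video3": make_buffer(frame3, 4, 4),
    },
}

# Writing through the Y "pointer" modifies the underlying frame memory.
buffer3 = scene_graph["MPEG2_video_object3"]["Buffer_video3"]
buffer3.y[0] = 255
```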

The rendering step 109 first assembles the MPEG-4 objects
MPEG4_video_object1 and MPEG4_video_object2 in a composition buffer by taking into
consideration characteristics of the structures Scene_video_object1 and
Scene_video_object2. Then the video object MPEG2_video_object3 is rendered along with
previously rendered MPEG-4 objects, for which the characteristics of the structure
Scene_video_object3 are taken into account.

Fig.2 shows the hardware architecture 200 for implementing the different steps 
of the video scene composition according to the invention. 

This architecture is structured around a data bus 201 to ensure data exchange 
between the different processing hardware units. This architecture includes an input 
peripheral 202 for receiving MPEG-4 and MPEG-2 input video streams, which are both 
stored in the mass storage 203. 

The decoding of video streams coded according to the MPEG-4 standard is 
done with the signal processor 204 (referred to as SP in the figure) executing instructions 
relative to an MPEG-4 decoding algorithm stored in memory 205, while the decoding of 
video streams coded according to MPEG-2 is also done with the signal processor 204 
executing instructions relative to an MPEG-2 decoding algorithm stored in said memory 205 
(or an appropriate decoding algorithm if the input video stream is coded according to a video 
standard other than the MPEG-2 one). Once decoded, MPEG-4 video objects are stored in a 
first data pool buffer 206, while MPEG-2 video data are stored in a second data pool buffer 
211. 

The video rendering step is performed by the signal processor 204 executing
instructions relative to a rendering algorithm stored in the memory 205. The rendering is
performed in that not only decoded MPEG-4 objects but also decoded MPEG-2 data are
assembled in a composition buffer 210. To this end, in order to avoid multiple and expensive
data manipulation, decoded MPEG-2 data are re-copied by a signal co-processor 209
(referred to as SCP in the figure) directly from buffer 211 into said composition buffer 210.
This re-copying ensures that a minimum computational load is used, which does not limit
other tasks in the application such as the decoding or the rendering tasks. At the same time,
the set of properties relative to said MPEG-2 data is filled and taken into account by the
signal processor during the rendering step. In this way, MPEG-2 data have a structure similar
to that of MPEG-4 data (i.e. an association of video data and properties), which allows the
rendering of all of the input video objects. Thus, the rendering takes into account not only
MPEG-4 object properties and MPEG-2 properties, but also data relating:

1. to the action of a mouse 207 and/or a keyboard 208,

2. and/or to BIFS commands issued from a BIFS stream stored in the storage device 203 or
received via input peripheral 202,

for changing the position of video objects in the video scene being built up, in dependence on
the action of the viewer using the EPG.

When a rendered frame is available in the contents of buffer 210, it is 
presented to an output video peripheral 212 for being displayed on a display 213. 

In this implementation, the processor 204 and the co-processor 209 are used
simultaneously, so that MPEG-4 input video objects composing the next output frame of the
video scene can always be decoded while the SCP re-copies into the composition
buffer the decoded MPEG-2 video data composing the current output frame of the video
scene. This is made possible by the non CPU-consuming process (Clock Pulse Units) carried
out by the SCP, which allows the SP to use the full CPU processing capacity. This optimized
processing will be highly appreciated by those skilled in the art, especially in a real-time
video scene composition context where input video objects of large size, requiring high
computational resources, have to be processed.
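
The overlap between the SP and the SCP may be illustrated by the following Python sketch, in which a single worker thread stands in for the co-processor. It is a structural illustration only: decode_mpeg4, copy_into and run_pipeline are hypothetical names, the decoder is a trivial stand-in rather than a real MPEG-4 decoder, and a Python thread does not reproduce the load-free behaviour of a dedicated hardware co-processor.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_mpeg4(coded: bytes) -> bytes:
    """Stand-in for the SP decoding task (hypothetical, not a real decoder)."""
    return bytes(b ^ 0xFF for b in coded)


def copy_into(composition_buffer: bytearray, decoded_mpeg2: bytes) -> None:
    """Stand-in for the SCP re-copy into the composition buffer."""
    composition_buffer[:len(decoded_mpeg2)] = decoded_mpeg2


def run_pipeline(coded_mpeg4, decoded_mpeg2):
    """Compose frame n while the MPEG-4 objects of frame n + 1 are decoded."""
    outputs = []
    next_objects = decode_mpeg4(coded_mpeg4[0])      # prime the pipeline
    with ThreadPoolExecutor(max_workers=1) as scp:   # one worker plays the SCP
        for n, background in enumerate(decoded_mpeg2):
            current_objects = next_objects
            composition_buffer = bytearray(len(background))
            copy_job = scp.submit(copy_into, composition_buffer, background)
            if n + 1 < len(coded_mpeg4):
                # The SP decodes the next frame's objects during the re-copy.
                next_objects = decode_mpeg4(coded_mpeg4[n + 1])
            copy_job.result()                        # wait for the re-copy to finish
            outputs.append((bytes(composition_buffer), current_objects))
    return outputs


frames = run_pipeline([b"\x00", b"\x01"], [b"AA", b"BB"])
```

Each output pairs the re-copied background of the current frame with the MPEG-4 objects that were decoded one iteration earlier, which is the one-frame overlap described above.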

Fig. 3 shows an embodiment of the invention. This embodiment corresponds to
an electronic program guide (EPG) application allowing a viewer to watch miscellaneous
information relative to TV channel programs on a display 304. To this end, the viewer
navigates through the screen by translating, by means of a mouse-like pointer device 305, the
browsing window 308 within a channel space 306 and a time space 307, said browsing
window playing the corresponding video preview of the chosen time/channel combination.
The browsing window 308 is overlaid and blended on top of a background video 309.

The different steps according to the invention described with reference to
Fig. 1 are implemented in a set-top box unit 301 which receives input video data from an
outside world 302. Said input video data correspond, in this example, to
MPEG-4 video data delivered by a first broadcaster (e.g. video objects 306-307-308) and to
MPEG-2 video data delivered by a second broadcaster (e.g. video data 309), via a
communication link 303. Said input video data are processed in accordance with the different
steps of the invention shown in Fig. 1 with the use of a hardware architecture as shown in
Fig. 2, resulting in MPEG-4 video frames composed of all of the input video
objects.

Of course, the presented graphic designs do not restrict the scope of the
invention; alternative graphic designs may be envisaged without deviating from the
scope of the invention.

There has been described an improved method of composing a scene content
simultaneously from input video streams encoded according to the MPEG-4 video standard
and from non MPEG-4 compliant video data (i.e. not coded according to the MPEG-4
standard) such as MPEG-2 video data. The method according to the invention relies on a
video object creation step that allows an MPEG-4 video scene to be composed from said non
MPEG-4 compliant video data thanks to the association of scene properties with said non
MPEG-4 compliant video data.

Of course, this invention is not restricted to the presented structure of scene
properties associated with said non MPEG-4 video data, and alternative fields defining this
structure may be considered without deviating from the scope of the invention.

This invention may be implemented in several manners, such as by means of
wired electronic circuits, or alternatively by means of a set of instructions stored in a
computer-readable medium, said instructions replacing at least part of said circuits and being
executable under the control of a computer, a digital signal processor or a digital signal
co-processor in order to carry out the same functions as fulfilled in said replaced circuits. The
invention then also relates to a computer-readable medium comprising a software module
that includes computer-executable instructions for performing the steps, or some steps, of the
method described above.



